CROSS-REFERENCES TO RELATED APPLICATIONS
This application claims priority from Provisional Appln. No. 60/070,650, 
filed January 7, 1998, the disclosure of which is incorporated herein by reference. 

BACKGROUND OF THE INVENTION

The virtual interface architecture (VIA) has been jointly developed by a
number of computer and software companies. VIA provides consumer processes with a
protected, directly accessible interface to network hardware, termed a virtual interface.
VIA is especially designed to provide low-latency message communication over a system
area network (SAN) to facilitate multiprocessing utilizing clusters of processors.

A SAN is used to interconnect nodes within a distributed computer system,
such as a cluster. The SAN is a type of network that provides high-bandwidth, low-latency
communication with a very low error rate. SANs often utilize fault-tolerant
technology to assure high availability. The performance of a SAN resembles that of a memory
subsystem more than that of a traditional local area network (LAN).

The VIA is described in the Virtual Interface Architecture Specification,
Draft Revision 1.0, December 4, 1997. The VI Architecture comprises four basic
components: Virtual Interfaces, Completion Queues, VI Providers, and VI Consumers.
The VI Provider is composed of a physical network adapter and a software Kernel Agent.
The VI Consumer is generally composed of an application program and an operating
system communication facility. The organization of these components is illustrated in
Fig. 1.

A VI is depicted in Fig. 2 and consists of a pair of Work Queues: a send
queue and a receive queue. VI Consumers post requests, in the form of Descriptors, on
the Work Queues to send or receive data. A Descriptor is a memory structure that
contains all of the information that the VI Provider needs to process the request, such as
pointers to data buffers.

The VI Provider is the set of hardware and software components
responsible for instantiating a Virtual Interface. The VI Provider consists of a network
interface controller (NIC) and a Kernel Agent (KA).

The VI NIC implements the Virtual Interfaces and directly performs data
transfer functions. The NIC provides an electro-mechanical attachment of a computer to
a network. Under program control, a NIC copies data from memory to the network
medium (transmission) and from the medium to memory (reception).

The Kernel Agent is a privileged part of the operating system, usually a
driver supplied by the VI NIC vendor, that performs the setup and resource management
functions needed to maintain a Virtual Interface between VI Consumers and VI NICs.
These functions include the creation/destruction of VIs, VI connection setup/teardown,
interrupt management and/or processing, management of system memory used by the VI
NIC, and error handling. VI Consumers access the Kernel Agent using standard operating
system mechanisms such as system calls. Kernel Agents interact with VI NICs through
standard operating system device management mechanisms.

The VI Architecture requires the VI Consumer to identify memory used
for a data transfer prior to submitting the request. Only memory that has been registered
with the VI Provider can be used for data transfers. This memory registration process
allows the VI Consumer to reuse registered memory buffers, thereby avoiding duplication
of locking and translation operations. Memory registration also takes this processing
overhead out of the performance-critical data transfer path.

Memory registration enables the VI Provider to transfer data directly
between the buffers of a VI Consumer and the network without copying any data to or
from intermediate buffers.

Memory registration consists of locking the pages of a virtually contiguous
memory region into physical memory and providing the virtual-to-physical translations to
the VI NIC. The VI Consumer gets an opaque handle for each memory region registered.
The VI Consumer can reference all registered memory by its virtual address and its
associated handle.
Memory is registered with the VI NIC for two reasons:

1) to allow the NIC to perform virtual-to-physical address translation; and

2) to allow the NIC to perform protection checking.

Consumers are able to use virtual addresses to refer to VI Descriptors and
communication buffers. The VI NIC is able to translate from virtual to physical addresses
through the use of its Translation and Protection Table (TPT). The TPT of the NIC
described in the VIA Specification resides on the NIC in order to assure fast, non-contentious
access and because it is accessed during performance-critical data movement.
A TPT and method of accessing the TPT are depicted in Fig. 3. The fields of each TPT
entry are:

a) a valid indication bit
b) a physical page address
c) a protection tag
d) an RDMA Write Enable Bit
e) an RDMA Read Enable Bit
f) a Memory Write Enable Bit

The size of the TPT is configurable. There is one entry in the TPT for each 
page that can be registered by the user. A memory region of N contiguous virtual pages 
consumes N contiguous entries in the TPT. 

When a memory region is registered with the NIC, the Kernel Agent
allocates a contiguous set of entries from the TPT and initializes them with the
corresponding physical page addresses and the protection tag specified by the process that
registered the memory region. The protection tag specified by the process when it creates
a VI is stored in the context memory of the VI. The NIC has access to the protection tag
in both of these areas, allowing it to compare these values to detect invalid accesses.
Page sizes larger than 4KB are supported, and the page size may differ among nodes of the
SAN.
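The single-level scheme just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the VIA Specification's or any NIC's actual implementation: it assumes 4 KB pages, models the TPT as a Python list, and uses a hypothetical handle choice in which subtracting the handle from the virtual page number yields the index of the first allocated entry. The class and method names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List

PAGE_SHIFT = 12  # assumption: 4 KB pages


@dataclass
class TPTEntry:
    valid: bool = False        # a) valid indication bit
    ppn: int = 0               # b) physical page number
    prot_tag: int = 0          # c) protection tag
    rdma_write: bool = False   # d) RDMA Write Enable
    rdma_read: bool = False    # e) RDMA Read Enable
    mem_write: bool = False    # f) Memory Write Enable


class OneLevelTPT:
    """Hypothetical single-level TPT in the style of the VIA Specification."""

    def __init__(self, size: int) -> None:
        self.entries: List[TPTEntry] = [TPTEntry() for _ in range(size)]

    def register(self, vpn_start: int, ppns: List[int], tag: int) -> int:
        """Kernel Agent side: claim len(ppns) contiguous free entries."""
        n = len(ppns)
        for base in range(len(self.entries) - n + 1):
            if all(not e.valid for e in self.entries[base:base + n]):
                for i, ppn in enumerate(ppns):
                    self.entries[base + i] = TPTEntry(valid=True, ppn=ppn, prot_tag=tag)
                # Hypothetical handle choice: VPN - handle yields the entry index.
                return vpn_start - base
        raise MemoryError("no run of contiguous free TPT entries")

    def translate(self, va: int, handle: int, vi_tag: int) -> int:
        """NIC side: translate va and compare the entry's tag with the VI's tag."""
        index = (va >> PAGE_SHIFT) - handle
        if not 0 <= index < len(self.entries):
            raise PermissionError("index outside the TPT")
        entry = self.entries[index]
        if not entry.valid or entry.prot_tag != vi_tag:
            raise PermissionError("invalid entry or protection tag mismatch")
        return (entry.ppn << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))


tpt = OneLevelTPT(size=8)
handle = tpt.register(vpn_start=0x100, ppns=[0x500, 0x501], tag=7)
phys = tpt.translate(0x101234, handle, vi_tag=7)   # 0x501234
```

Because the handle is fixed at registration time, the entries it points to cannot move without invalidating every descriptor that carries it, which is the limitation discussed next.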

The above-described implementation of the TPT has several
disadvantages. If TPT entries are allowed to exist anywhere in memory, an application
could set up bogus TPT entries that point to any physical address. An RDMA Write
descriptor could then be set up, with an appropriate Virtual Address and Memory Handle, to
use this bogus TPT entry and write anywhere in memory. The standard solution is to
limit the locations of legal TPT entries. However, the requirement to allocate contiguous
memory to facilitate bounds checking consumes a large amount of memory. Another
problem resulting from the standard solution is that it may lead to fragmentation of entries
in the TPT, which can cause a failure when attempting to find the multiple consecutive
entries required when registering large memory regions.

The fragmentation problem is illustrated in Fig. 4, which depicts an
exaggerated example where the TPT range is limited to only eight entries. There are
three active registered memory regions, with TPT owner IDs X, Y, and Z, which
differentiate the registered memory regions. An application cannot register a new two-page
memory region, Mem Region 4, because, due to previous fragmentation of the TPT,
no two free TPT entries are contiguous. Thus, Mem Region 4 cannot be registered even
though there are three available entries in the TPT.
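The failure in Fig. 4 can be reproduced with a simple first-fit search for a run of free entries. This is a sketch, not the Kernel Agent's actual allocator. The exact layout below is an assumption: it is chosen so that three entries are free and no two of them are adjacent, consistent with the description and with the relocation steps of Figs. 8A-8C.

```python
from typing import List, Optional


def find_contiguous_free(tpt: List[Optional[str]], n: int) -> Optional[int]:
    """First-fit search: index of the first run of n free (None) entries."""
    run_start, run_len = 0, 0
    for i, owner in enumerate(tpt):
        if owner is None:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == n:
                return run_start
        else:
            run_len = 0          # an occupied entry breaks the run
    return None


# Hypothetical eight-entry TPT: owner ID per entry, None = free.
tpt = ["X", "X", "X", None, "Y", None, "Z", None]
free_entries = tpt.count(None)               # 3
slot = find_contiguous_free(tpt, 2)          # None: Mem Region 4 cannot be placed
```

Enough entries are free in total, but a single-entry request succeeds while the two-entry request fails, which is exactly the fragmentation problem.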

If the Memory Handles could be reassigned, then larger contiguous sets of
free locations could be found. Unfortunately, this is not possible because the Memory
Handles returned to the applications earlier are already in use in descriptors, and it would
be undesirable to stop VI processing and update all the descriptors.

SUMMARY OF THE INVENTION 
According to one aspect of the invention, a two-level look-up scheme
utilizes a Memory Handle Index to index a table of Memory Handles, the
Memory Handle Table (MHT), and the Memory Handle so obtained is used for accessing the TPT.

According to another aspect of the invention, an application receives a
Memory Handle Index when it registers memory. The TPT entries for the registered area
of memory can be moved and the Memory Handles reassigned without requiring the
descriptors, which use the Memory Handle Index, to be updated.

According to another aspect of the invention, the TPT can be stored in any
place in memory, and fields for base/bounds checking are included in each MHT entry.

According to another aspect of the invention, the TPT can be
defragmented by moving fragmented entries to free locations and updating the Memory
Handle to point to the new location. Since descriptors in use hold Memory Handle
Indices, the descriptors do not need to be updated.

Other features and advantages of the invention will be apparent in view of 
the following detailed description and appended drawings. 
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram of the Virtual Interface Architecture (VIA); 
Fig. 2 is a block diagram of a Virtual Interface (VI); 
Fig. 3 is a block diagram of the VIA address translation scheme; 
Fig. 4 is a block diagram depicting a fragmented TPT; 
Fig. 5 is a schematic diagram of a preferred embodiment of the two-level 
address translation mechanism; 

Fig. 6 is a block diagram of the MHT entry format and the TPT entry
format;

Fig. 7 is a block diagram of a fragmented TPT utilizing the two-level look-up
table scheme; and

Figs. 8A-8C depict the steps of defragmenting the TPT. 

DESCRIPTION OF THE SPECIFIC EMBODIMENTS 

A preferred embodiment of the invention will now be described with
reference to Fig. 5, which depicts a novel two-level scheme for accessing a translation
protection table (TPT) implemented by the network interface card (NIC) and kernel agent
(KA) of the VI Provider, as depicted in Fig. 1.

Applications access memory using virtual addresses 50 and Memory 
Handle Indices (MHI) 52. The NIC provides the translation to physical addresses. The 
MHI value 52 is returned from the VI User Agent during memory registration. 

The MHI 52 is an offset into a first-level table called the Memory Handle
Table 54. This first-level table contains the Memory Handles (MH) 55. The MH is
subtracted from the Virtual Page Number (VPN) 50 to generate a pointer into the second-level
Translation and Protection Table (TPT) 56. This pointer is called the Pseudo-Address
(PSA). Note that in the VIA Specification and Fig. 3 this pointer is denoted the
"protection index". The TPT holds the Physical Page Number (PPN). The MHI is 20 bits,
allowing for up to 1M Memory Handles. The present embodiment requires the Memory
Handle Table to reside in physically contiguous memory, which begins at the address held
in the Memory Handle Base register.
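The two-level lookup can be sketched in a few lines. This is an illustrative model only, not the NIC's hardware logic: it assumes 4 KB pages, represents the Memory Handle Table as a Python list indexed by MHI and the TPT as a dict from pseudo-address to PPN, and assumes the subtraction wraps modulo 2^32 to match the 32-bit Memory Handle width. The function name and the sample registration are hypothetical.

```python
from typing import Dict, List

PAGE_SHIFT = 12                 # assumption: 4 KB pages
MHI_LIMIT = 1 << 20             # 20-bit MHI: up to 1M Memory Handles
MH_MASK = (1 << 32) - 1         # assumption: 32-bit wraparound for VPN - MH


def two_level_translate(va: int, mhi: int, mht: List[int], tpt: Dict[int, int]) -> int:
    """Translate va via MHI -> Memory Handle (MHT) -> PSA -> PPN (TPT)."""
    if not 0 <= mhi < min(MHI_LIMIT, len(mht)):
        raise IndexError("Memory Handle Index out of range")
    mh = mht[mhi]                                  # first-level lookup
    psa = ((va >> PAGE_SHIFT) - mh) & MH_MASK      # VPN - MH = pseudo-address
    ppn = tpt[psa]                                 # second-level lookup (KeyError if absent)
    return (ppn << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))


# Hypothetical registration: VPNs 0x100-0x101 use TPT entries 40-41 via MHI 2.
mht = [0, 0, 0x100 - 40, 0]
tpt = {40: 0x7000, 41: 0x7001}
pa = two_level_translate(0x101ABC, 2, mht, tpt)    # 0x7001ABC
```

The descriptor only needs the virtual address and the MHI; the Memory Handle itself lives in the table and can be changed by the Kernel Agent.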

Each Memory Handle Table entry is 8 bytes. The Memory Handle is 32
bits, allowing 4G TPT entries. Each TPT entry is 8 bytes. Protection checks that limit
the start and extent of TPT entries force them to begin in the lower 8G bytes of memory
because of the size of the TPT Start field in the present embodiment.

The VIA Specification uses terminology for the Memory Handle that differs from
that used with the presently described NIC, since the VIA
Specification describes only a one-table lookup implementation, whereas the present NIC
uses a two-table lookup implementation for calculating physical addresses. Both the
implementation described in the VIA Specification and the preferred embodiments
subtract the Memory Handle from the virtual address to obtain the pseudo-address (PSA).

But the NIC of the preferred embodiment does not get the Memory Handle
from the descriptor; it gets the Memory Handle from the first-level Memory Handle Table 54,
which is indexed by the Memory Handle Index 52 obtained from the descriptor.
Therefore, in the preferred embodiment, the Memory Handle Index 52 is returned by the
VI User Agent RegisterMem call, whereas in the VIA Specification the Memory Handle itself
is returned by the VI User Agent RegisterMem call. As noted, the Memory Handle Index
(MHI) and Memory Handle (MH) are not the same, even though the VIA implementation
and the preferred embodiment describe the same VI User Agent call (RegisterMem) as
returning the value the implementation needs.

Fig. 6 depicts the Memory Handle Table entry format 60 and the TPT entry
format 70. The TPT Start field 62 is a physical address pointer, in 4K byte units, to the
beginning of the block of TPT entries allocated as part of the memory registration. This
field is 21 bits in width, requiring TPT entries to start within the lower 8G bytes of memory.
The TPT Extent field 64 indicates how many 4K byte pages of TPT entries are valid for this
registration. Each page can hold 512 TPT entries. The TPT Extent field is 10 bits in
width, allowing up to 1023 pages, each page containing 512 TPT entries. Therefore, the
maximum memory a single memory registration can handle is 1023 x 512 x PageSize. For
a PageSize of 4K bytes, this is 2G bytes - 2M bytes.
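The limits quoted above follow directly from the field widths. The arithmetic below reproduces them, assuming 4 KB pages as in the text's example; the constant names are illustrative only.

```python
PAGE_SIZE = 4 * 1024                                   # 4 KB pages
TPT_ENTRY_BYTES = 8
ENTRIES_PER_TPT_PAGE = PAGE_SIZE // TPT_ENTRY_BYTES    # 512 entries per 4 KB page

TPT_START_BITS = 21
TPT_EXTENT_BITS = 10
MH_BITS = 32
MHI_BITS = 20

# TPT Start counts 4 KB units, so TPT blocks must begin below this address.
tpt_start_limit = (1 << TPT_START_BITS) * PAGE_SIZE            # 8 GB
# Largest TPT Extent value, in 4 KB pages of TPT entries.
max_extent = (1 << TPT_EXTENT_BITS) - 1                        # 1023
# Largest single registration: 1023 x 512 x PageSize.
max_registration = max_extent * ENTRIES_PER_TPT_PAGE * PAGE_SIZE   # 2 GB - 2 MB
max_handles = 1 << MHI_BITS                                     # 1M Memory Handles
max_tpt_entries = 1 << MH_BITS                                  # 4G TPT entries
```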

All Memory Handle Table entries must be appropriately programmed by
the Kernel Agent. Any unused entry must have its TPT Extent field set to all zeros. The
second-level TPT entries indicated by the TPT Start, TPT Extent pair must also be
programmed by the Kernel Agent. Unused entries must have their valid bits (V) cleared;
this includes unused entries beyond those used for the memory registration but within the
same 4K byte page as the last valid entry.
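The Kernel Agent's programming rules can be sketched as follows. This is a hypothetical model, not the actual driver code: it assumes TPT Start and TPT Extent are stored in 4 KB page units, represents each TPT entry as a `(valid, ppn)` pair, and omits the remaining TPT fields. The names `MHTEntry` and `program_registration` are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

ENTRIES_PER_TPT_PAGE = 512          # 4 KB page / 8-byte TPT entry
MAX_EXTENT = (1 << 10) - 1          # 10-bit TPT Extent field


@dataclass
class MHTEntry:
    memory_handle: int = 0
    tpt_start: int = 0               # TPT block location, in 4 KB pages
    tpt_extent: int = 0              # valid 4 KB pages of TPT entries; 0 = unused


def program_registration(mht: List[MHTEntry], mhi: int, ppns: List[int],
                         tpt_start: int, memory_handle: int) -> List[Tuple[bool, int]]:
    """Fill one MHT entry and build whole 4 KB pages of (valid, ppn) TPT entries."""
    tpt_pages = math.ceil(len(ppns) / ENTRIES_PER_TPT_PAGE)
    if tpt_pages > MAX_EXTENT:
        raise ValueError("registration exceeds the TPT Extent field")
    # Every slot starts invalid, including the tail of the last 4 KB page.
    block = [(False, 0)] * (tpt_pages * ENTRIES_PER_TPT_PAGE)
    for i, ppn in enumerate(ppns):
        block[i] = (True, ppn)
    mht[mhi] = MHTEntry(memory_handle, tpt_start, tpt_pages)
    return block


mht = [MHTEntry() for _ in range(4)]          # unused entries keep extent 0
block = program_registration(mht, 1, [0x900, 0x901, 0x902], tpt_start=0x20, memory_handle=0)
```

A three-page registration therefore occupies one full 4 KB page of TPT entries, with only the first three marked valid.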
Referring back to Fig. 5, the use of the TPT Start and Extent fields to
implement base/bound checking will now be described. The Pseudo-Address (PSA) is a
pointer into the TPT. The magnitude of the PSA generated for a particular Memory
Handle is compared to the TPT Start field and to the sum of the TPT Start and Extent fields
(which sum gives the bound of the TPT). If the generated PSA is less than TPT Start or
greater than the sum of TPT Start and Extent, then a TPT Extent Violation is signaled.
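A sketch of the base/bound check follows. The units are an assumption, since the text does not spell them out: the sketch treats `psa * 8` as the physical byte address of the selected TPT entry and compares that entry's 4 KB page number against TPT Start and TPT Start + TPT Extent. It also treats the upper bound as exclusive, so a page equal to Start + Extent is rejected; the function name is hypothetical.

```python
TPT_ENTRY_BYTES = 8
PAGE_SHIFT = 12   # TPT Start and TPT Extent are in 4 KB units


def check_tpt_extent(psa: int, tpt_start: int, tpt_extent: int) -> None:
    """Raise if the TPT entry selected by psa lies outside [Start, Start + Extent)."""
    entry_page = (psa * TPT_ENTRY_BYTES) >> PAGE_SHIFT   # assumed entry address, in pages
    if entry_page < tpt_start or entry_page >= tpt_start + tpt_extent:
        raise PermissionError("TPT Extent Violation")


# Hypothetical registration with its TPT block at page 0x20, one page long:
# valid pseudo-addresses are 16384 .. 16895 (512 entries).
check_tpt_extent(16384, 0x20, 1)          # passes
check_tpt_extent(16384 + 511, 0x20, 1)    # passes
```

With an exclusive upper bound, an unused MHT entry (TPT Extent of zero) rejects every pseudo-address, which matches the rule that unused entries must have their Extent field cleared.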

The use of the two-level accessing scheme to rearrange TPT entries
to eliminate fragmentation will now be described with reference to Figs. 4, 7, and 8A-8C.
As previously described, a new region of memory must be registered utilizing contiguous
entries in the TPT. Fig. 4 depicts a TPT having three unused entries, but due to previous
assignment of Memory Handles, no two entries are contiguous, and a new memory region
of two pages cannot be registered.

Fig. 7 depicts the same memory regions and TPT entries as Fig. 4, but
utilizing the two-level table look-up scheme described above. Thus, an MHI, A_HP1 to
A_HP3, has been returned for each memory region registered. Each of these MHIs obtains
an MH from the MH table 54, which, when combined with the Virtual Address provided
by an application, forms a PSA (A_V - A_H) that accesses the correct entry in the TPT.

The defragmentation of the TPT to provide contiguous entries will now be
described with reference to Figs. 8A-8C. Step 1, shown in Fig. 8A, is to copy the TPT
entry(ies) to be relocated. In this case, the contents of entry[6] are copied to entry[3].

Next, in Step 2, Fig. 8B, the Memory Handles for the relocated TPT 
entry(ies) are reassigned. In this case, the MH that previously formed a PSA pointing to 
entry[6] is changed to an MH that forms a PSA pointing to entry[3]. Note that the 
reassigned MH is still located in the same entry in the Memory Handle table so the MHI 
indexes the correct MH to access the correct translation data. Thus, the entries in the TPT 
can be moved without having to update the descriptors. 

Finally, in Step 3, shown in Fig. 8C, a new handle, AH*, is added, which forms a PSA
pointing to entry[5], and the translation data for Mem Region 4 is stored in entry[5] and
entry[6] of the TPT. The MHI A_HP4 is returned to the application registering Mem
Region 4.

In the preferred embodiment, the KA copies the Mem Region 3 data to the
new TPT entry and then changes the data in the Memory Handle Table to access the newly
copied entry. This frees up three consecutive TPT entry locations, which can then be used
for the newly registered Mem Region 4.
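The three steps of Figs. 8A-8C can be sketched end to end. This is an illustrative model under assumptions, not the Kernel Agent's actual code: the TPT is a list of `(region, ppn)` pairs (None = free), the Memory Handle Table is a dict from MHI to Memory Handle, PSA = VPN - MH, and the eight-entry layout, VPNs and PPNs are hypothetical values consistent with the walkthrough (entries 3, 5 and 7 free; Mem Region 3 in entry 6). The point it demonstrates is that a descriptor holding only (VPN, MHI) still translates correctly after the relocation.

```python
from typing import Dict, List, Optional, Tuple

Entry = Optional[Tuple[str, int]]      # (owning region, PPN); None = free entry


def translate(vpn: int, mhi: int, mht: Dict[int, int], tpt: List[Entry]) -> int:
    """Two-level lookup: MHI -> Memory Handle -> pseudo-address -> PPN."""
    psa = vpn - mht[mhi]
    entry = tpt[psa] if 0 <= psa < len(tpt) else None
    if entry is None:
        raise LookupError("invalid TPT entry")
    return entry[1]


def relocate(mht: Dict[int, int], tpt: List[Entry], mhi: int,
             src: int, dst: int, n: int) -> None:
    """Steps 1-2: copy n entries to a non-overlapping free run, then rewrite
    the Memory Handle held in the same MHT slot so VPN - MH hits the copy."""
    tpt[dst:dst + n] = tpt[src:src + n]    # Step 1: copy translation data
    mht[mhi] -= dst - src                  # Step 2: reassign the MH in place
    tpt[src:src + n] = [None] * n          # old entries become free (invalid)


def register(mht: Dict[int, int], tpt: List[Entry], mhi: int,
             region: str, vpn_start: int, ppns: List[int]) -> int:
    """Step 3: place a new region in the first free run and store its MH."""
    n = len(ppns)
    for base in range(len(tpt) - n + 1):
        if all(e is None for e in tpt[base:base + n]):
            tpt[base:base + n] = [(region, p) for p in ppns]
            mht[mhi] = vpn_start - base
            return mhi                     # the application receives the MHI
    raise MemoryError("no contiguous free TPT entries")


# Hypothetical state matching the walkthrough: entries 3, 5 and 7 are free.
tpt: List[Entry] = [("R1", 0xA0), ("R1", 0xA1), ("R1", 0xA2), None,
                    ("R2", 0xB0), None, ("R3", 0xC0), None]
mht = {1: 0x10, 2: 0x40 - 4, 3: 0x80 - 6}
descriptor = (0x80, 3)                     # (VPN, MHI) already posted for Mem Region 3

relocate(mht, tpt, mhi=3, src=6, dst=3, n=1)        # Figs. 8A and 8B
register(mht, tpt, 4, "R4", 0x200, [0xD0, 0xD1])    # Fig. 8C: uses entries 5-6
```

After relocation, entries 5 through 7 are free, so the two-page region fits in entries 5 and 6, and the posted descriptor is never rewritten.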

The invention has now been described with reference to the preferred
embodiments. Alternatives and substitutions will now be apparent to persons of skill in
the art. For example, the particular sizes of the fields described are not critical to the
invention. In addition, different algorithms for combining a Memory Handle and a virtual
address could be utilized. Accordingly, it is not intended to limit the invention except as
provided by the appended claims.
WHAT IS CLAIMED IS:



1. A memory registration and two-level address translation and
protection method implemented by a network interface card (NIC) and kernel agent
forming a virtual interface provider, said method comprising the steps of:
providing a memory handle index corresponding to each region of memory
registered;
maintaining a memory handle table with each entry accessed by a memory
handle index and storing a memory handle;
maintaining a translation and protection table (TPT) including a plurality of TPT
entries, each TPT entry storing a physical address which is the translation of a virtual
address utilized by a virtual interface consumer to access registered memory;
providing a first virtual address to be translated, with the first virtual
address included in a first registered memory region, and also providing a first memory
handle index corresponding to the first registered region;
utilizing the first memory handle index to access an entry in the memory handle
table holding a first memory handle; and
combining the first memory handle and the first virtual address to form a
pseudo-address for accessing a first entry in the TPT holding a first physical address that
translates the first virtual address.

2. The method of claim 1 further comprising the steps of:
including start and extent fields in each entry of the memory handle table;
after generating the first pseudo-address to access the TPT:
comparing the first pseudo-address to the start field and indicating an
extent violation if the magnitude of the start field is greater than the magnitude of
the first pseudo-address; and
comparing the first pseudo-address to the sum of the start and extent fields
and indicating an extent violation if the magnitude of the sum of the start and extent fields
is less than the magnitude of the first pseudo-address.



3. A method for defragmenting a translation protection table
comprising the steps of:
providing a translation protection table (TPT) having a plurality of TPT
entries, with each TPT entry holding translation data for a virtual address included in a
registered memory region;
providing a memory handle table (MHT) having a plurality of MHT
entries, each MHT entry associated with a registered memory region, with each MHT
entry holding a memory handle, with the memory handle used in conjunction with a
virtual address to access the TPT entry holding translation data for the virtual address;
providing a unique memory handle index (MHI) for each memory region
registered, with each unique memory handle index for accessing the entry of the memory
handle table holding the memory handle for accessing TPT entries holding translation
data for virtual addresses in the registered memory region;
storing translation data for each page of a first registered memory region
as the content of contiguous entries of the translation protection table, with the first
memory region associated with a first memory handle index;
if sufficient unused entries for storing translation data for a second
memory region, associated with a second memory handle index, exist in the TPT but the
entries are not contiguous:
copying contents of fragmented entries, storing translation data for
the first registered memory region, to selected unused entries in the TPT, to form a
contiguous region of unused TPT entries for storing translation data for the second
memory region;
updating the memory handle, stored in the MHT entry indexed
by the first MHI, to access the selected TPT entries now storing translation data
for the first registered memory region;
storing translation data for the second memory region in the
contiguous region of TPT entries that previously stored translation data for the
first memory region; and
storing a memory handle in the MHT entry accessed by the
second MHI to access the contiguous region of TPT entries holding translation data for
the second memory region.
4. A system for performing address translation that utilizes a memory
handle index provided to a user application, with the memory handle index associated with a
memory region registered by the user application, and with the memory region
comprising a plurality of contiguous virtual addresses, said system comprising:
a memory handle table (MHT), having a plurality of MHT entries, with each MHT
entry accessed by a unique memory handle index and holding a memory handle;
a translation and protection table (TPT), having a plurality of TPT entries,
with each TPT entry accessed by a TPT pointer and holding translation data for a virtual
address in a registered memory region; and
pointer generating logic, responsive to a particular virtual address and a
particular memory handle index provided by a user application, for combining a memory
handle, accessed from the memory handle table by the particular memory handle index,
with the particular virtual address to generate a particular TPT pointer that accesses
translation data for the particular virtual address from the TPT.
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Mem Region 2, 1 page, (A H P2l 

Mem Region 1,3 pages, [A H pi1 



TPT Entries : _ TPTBound 




- Changed or Added by the Kernel Agent 




Step 2 : Reassign Memory Handles for 
relocated TPT(s) 

Handle Table ^ 
Mem Region 4, 2 pages, [A H P4l: - _ v ^ 



Mem Region 3, 1 page, [A H p 3 ]:^i— 
Mem Region 2, Vpage, [A H p2l : ^-*tl 
*" l: — tZI 



Mem Region 1,3 pages, [A HP1 1: 




^_TPT Bound 



TPT Base 



Step 3 : Add New Handle(s) and TPT Entry(ies) 

Mem Region 4. 2 pages, (A H P4): 
Mem Region 3, 1 page, [A H P3l : - v 
Mem Region 2, 1 page, (A H P2) 

Mem Region 1 , 3 pages, (A,^) 

fry %c 



- Changed or Added by the Kemal Agent 
TPT Entries 




^_ TPT Bound 



— TPT Base 



g-rn - Changed or Added by the Kemal Agent 
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